Supporting SQL-3 Aggregations on Grid-Based Data Repositories

Authors

  • Li Weng
  • Gagan Agrawal
  • Ümit V. Çatalyürek
  • Joel H. Saltz
Abstract

There is an increasing trend towards distributed and shared repositories for storing scientific datasets. Developing applications that retrieve and process data from such repositories involves a number of challenges. First, these data repositories store data in complex, low-level layouts, which should be abstracted from application developers. Second, as data repositories are shared resources, part of the computation on the data must be performed on a different set of machines than the ones hosting the data. Third, because of the volume of data and the amount of computation involved, parallel configurations need to be used both for hosting the data and for processing the retrieved data. In this paper, we describe a system for executing SQL-3 queries over scientific data stored as flat files. A relational table-based virtual view is supported on these flat-file datasets. The class of queries we consider involves data retrieval using Select and Where clauses, and processing with user-defined aggregate functions and group-bys. We use a middleware system, STORM, to provide much of the low-level functionality. Our compiler analyzes the SQL-3 queries and generates many of the functions required by this middleware. Our experimental results show good scalability with respect to the number of nodes as well as the dataset size.
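The query class described in the abstract (a virtual table over a flat file, Select/Where filtering, and a user-defined aggregate with a group-by) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the file layout, the column names, the `scan` and `run_query` helpers, and the `Mean` aggregate are hypothetical, and none of it reflects STORM's interface or the code the authors' compiler generates.

```python
from collections import defaultdict

# Hypothetical whitespace-delimited flat file; columns are x, y, time, value.
FLAT_FILE = """\
0 0 1 2.0
0 1 1 4.0
1 0 1 5.0
1 1 2 8.0
0 0 2 10.0
"""

# Assumed schema that maps raw fields to named, typed columns.
SCHEMA = [("x", int), ("y", int), ("time", int), ("value", float)]


def scan(text, schema):
    """Expose the raw lines as rows of a virtual relational table."""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        yield {name: cast(raw) for (name, cast), raw in zip(schema, fields)}


class Mean:
    """Illustrative user-defined aggregate: accumulate, merge, finalize."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def accumulate(self, v):
        self.total += v
        self.count += 1

    def merge(self, other):
        # Combine partial states, e.g. ones computed on different nodes.
        self.total += other.total
        self.count += other.count

    def finalize(self):
        return self.total / self.count


def run_query(rows, where, group_key, agg_column, agg_factory):
    """Filter rows, group them, and apply the aggregate per group."""
    groups = defaultdict(agg_factory)
    for row in rows:
        if where(row):
            groups[group_key(row)].accumulate(row[agg_column])
    return {key: state.finalize() for key, state in groups.items()}


# Roughly: SELECT x, MEAN(value) FROM T WHERE y = 0 GROUP BY x
result = run_query(
    scan(FLAT_FILE, SCHEMA),
    where=lambda r: r["y"] == 0,
    group_key=lambda r: r["x"],
    agg_column="value",
    agg_factory=Mean,
)
print(result)
```

The `merge` step is included because an aggregate whose partial states can be combined is what allows filtering and partial aggregation to run in parallel before a final combination; how the described system actually partitions this work is not stated in the abstract.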


Related resources

Transparency in Object-Oriented Grid Database Systems

The paper presents various transparency issues that have to be considered during development of object-oriented Grid applications based on virtual repositories. Higher-level transparencies, such as location, heterogeneity, fragmentation, replication, redundancy, indexing and service provider transparency assure new information processing culture greatly supporting the development, operation and...


FED - A Framework for Iterative Data Selection in Exploratory Visualization

This paper presents a paradigm for the interactive selection (querying) of data from a structured grid of data points for exploratory visualization. The paradigm is based on specifying and iteratively adjusting the Focus, Extent, and Density (FED) of the data attributes. The FED model supports highly complex queries of structured data in an intuitive fashion, and is augmented with a visual inte...


Prepare and Optimize Data Sets for Data Mining Analysis

Getting ready a data set for examination is usually the tedious errand in a data mining task, needing numerous complex SQL queries, joining tables and conglomerating sections. Existing SQL aggregations have limitations to get ready data sets since they give back one section for every amassed bunch. As a rule, a significant manual exertion is obliged to construct data sets, where a horizontal la...


Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Data mining is widely used domain for extracting trends or patterns from historical data. However, the databases used by enterprises can’t be directly used for data mining. It does mean that Data sets are to be prepared from real world database to make them suitable for particular data mining operations. However, preparing datasets for analyzing data is tedious task as it involves many aggregat...


Consistent Aggregations in Databases with Referential Integrity Errors

A data warehouse integrates tables coming from multiple source databases, where each database has different tables, columns with similar content across databases and different referential integrity constraints, enforced to different compliance levels. Some source databases may have more reliable data than others, if referential integrity is more strictly enforced or their respective logical dat...




Publication date: 2004